AMENDMENTS TO THE CLAIMS 



The following listing of claims replaces all prior versions of claims in the application. 

1. (Previously presented): A robotics visual and auditory system comprising:
a plurality of acoustic models,

a speech recognition engine for executing speech recognition processes on sound signals
separated from respective sound sources by using the acoustic models, and

a selector for integrating a plurality of speech recognition process results obtained by the
speech recognition processes, and selecting any one of the speech recognition process results,

wherein, in order to respond to the case where a plurality of speakers speak to said robot from
different directions with the robot's front direction as the base, the acoustic models are provided
for each speaker and each direction so as to respond to each direction,

wherein the speech recognition engine uses each of said acoustic models separately for one
sound signal separated by sound source separation, and executes said speech recognition processes in
parallel.
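The parallel use of per-speaker, per-direction acoustic models described in Claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `(speaker, direction)` model key, the recognizer callables, and the "higher score is better" convention are hypothetical stand-ins for a real speech recognition engine.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical model key: (speaker name, direction in degrees from the robot's front).
ModelKey = Tuple[str, int]
# Hypothetical recognizer: takes one separated signal, returns (word string, score).
Recognizer = Callable[[List[float]], Tuple[str, float]]


@dataclass
class RecognitionResult:
    model_key: ModelKey
    words: str
    score: float  # assumed convention: higher means a better match


def recognize_in_parallel(
    signal: List[float], models: Dict[ModelKey, Recognizer]
) -> List[RecognitionResult]:
    """Apply every acoustic model separately to the same separated signal, concurrently."""
    if not models:
        return []
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {key: pool.submit(model, signal) for key, model in models.items()}
    # Leaving the with-block waits for all submitted recognitions to finish.
    return [RecognitionResult(key, *future.result()) for key, future in futures.items()]


# Toy recognizers standing in for real acoustic models.
models = {
    ("Alice", 0): lambda s: ("hello", -10.0),
    ("Alice", 60): lambda s: ("hollow", -14.0),
    ("Bob", 0): lambda s: ("yellow", -12.5),
}
results = recognize_in_parallel([0.0] * 160, models)
```

Threads keep the sketch simple; a recognizer written in pure Python and bound by CPU time would likely need a process pool instead.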

2. (Previously presented): A robotics visual and auditory system as set forth in Claim 1,
wherein, upon integrating the speech recognition process results, the selector calculates a cost
function value based on the recognition result of the speech recognition process and the speaker's
direction, and judges the speech recognition process result having the maximum value of the cost
function to be the most reliable speech recognition result.
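The claim does not define the cost function, so the following Python sketch assumes one purely for illustration: the recognizer's score minus a penalty proportional to the angular mismatch between the model's direction and the speaker's direction. The weight `alpha`, the `Candidate` fields, and the score convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    speaker: str
    model_direction_deg: float  # direction the acoustic model was prepared for
    words: str
    score: float                # assumed recognizer score, higher is better


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def cost(c: Candidate, speaker_direction_deg: float, alpha: float = 0.1) -> float:
    # Hypothetical cost function: reward the recognition score and penalize the
    # mismatch between the model's direction and the speaker's direction.
    return c.score - alpha * angular_distance(c.model_direction_deg, speaker_direction_deg)


def select_most_reliable(cands: List[Candidate], speaker_direction_deg: float) -> Candidate:
    """Return the candidate whose cost function value is maximal."""
    return max(cands, key=lambda c: cost(c, speaker_direction_deg))


cands = [
    Candidate("Alice", 0.0, "hello", -10.0),
    Candidate("Alice", 60.0, "hollow", -9.0),
]
best = select_most_reliable(cands, speaker_direction_deg=5.0)
```

With the speaker localized at 5 degrees, the front-facing model wins even though the 60-degree model scored higher on its own; the direction term is what lets localization influence the selection.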

3. (Original): A robotics visual and auditory system as set forth in Claim 1 or Claim 2,
wherein the system is provided with a dialogue part that outputs the speech recognition process
results selected by the selector to the outside.

4. (Previously presented): A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external
sounds, and, based on sound signals from the microphones, determines a direction of at least one
speaker by sound source separation and localization by grouping based on pitch extraction and
harmonic sounds,

a face module which is provided with a camera to take images of the robot's front, identifies each
speaker, and extracts a face event from each speaker's face recognition and localization, based
on images taken by the camera,

a motor control module which is provided with a drive motor to rotate the robot in the
horizontal direction, and extracts a motor event based on a rotational position of the drive motor,

an association module which determines each speaker's direction, based on directional 
information of sound source localization of the auditory event and face localization of the face 
event, from said auditory, face, and motor events, generates an auditory stream and a face stream 
by connecting said events in the temporal direction using a Kalman filter for determinations, and 
further generates an association stream associating these streams, and 

an attention control module which conducts attention control based on said streams, and
drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to respond to the case where a plurality of speakers speak to
said robot from different directions with the robot's front direction as the base, acoustic models are
provided for each direction so as to respond to each speaker and each direction,

wherein the auditory module collects sub-bands having an interaural phase difference (IPD)
or interaural intensity difference (IID) within a predetermined range by an active direction pass
filter having a pass range which, according to auditory characteristics, is minimum in the
frontal direction and becomes larger as the angle widens to the left and right, based on accurate
sound source directional information from the association module, conducts sound source
separation by reconstructing the waveform of a sound source, conducts speech recognition in parallel
for one sound signal separated by sound source separation using a plurality of the acoustic models,
integrates the speech recognition results from each acoustic model by a selector, and judges the most
reliable speech recognition result among the speech recognition results.
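The active direction pass filter of Claim 4 can be sketched in Python with NumPy. This is an illustrative approximation, not the system's actual filter: it substitutes a simple free-field two-microphone model (IPD = 2πf·d·sinθ / c) for whatever IPD-to-direction mapping the real system uses, and the microphone spacing and pass-range constants are hypothetical. Only IPD is used, and only below the frequency where IPD wraps around; the claim also allows IID, which is not modeled here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_DISTANCE = 0.18     # m, hypothetical spacing between the two microphones


def pass_range_deg(target_deg: float, base_deg: float = 20.0, side_deg: float = 20.0) -> float:
    """Hypothetical pass range: narrowest in front (0 deg), widening toward the sides."""
    return base_deg + side_deg * abs(np.sin(np.radians(target_deg)))


def active_direction_pass_filter(left, right, fs, target_deg):
    """Keep sub-bands whose IPD-derived direction lies within the pass range of target_deg."""
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), d=1.0 / fs)
    ipd = np.angle(L * np.conj(R))  # interaural phase difference per sub-band

    # Free-field model (assumption). Above c/(2d) the IPD wraps around, so those
    # sub-bands are excluded in this sketch.
    usable = (freqs > 0) & (freqs < SPEED_OF_SOUND / (2 * MIC_DISTANCE))
    sin_theta = np.clip(
        ipd[usable] * SPEED_OF_SOUND / (2 * np.pi * freqs[usable] * MIC_DISTANCE), -1.0, 1.0
    )
    est_deg = np.degrees(np.arcsin(sin_theta))

    mask = np.zeros(freqs.shape, dtype=bool)
    mask[usable] = np.abs(est_deg - target_deg) <= pass_range_deg(target_deg)
    # Reconstruct the waveform from the passed sub-bands of the left channel.
    separated = np.fft.irfft(np.where(mask, L, 0), n=len(left))
    return separated, mask


# Toy input: a 500 Hz tone arriving from 30 deg (the left microphone leads).
fs, n = 16000, 1024
t = np.arange(n) / fs
tau = MIC_DISTANCE * np.sin(np.radians(30.0)) / SPEED_OF_SOUND
left = np.sin(2 * np.pi * 500.0 * t)
right = np.sin(2 * np.pi * 500.0 * (t - tau))
out_30, mask_30 = active_direction_pass_filter(left, right, fs, target_deg=30.0)
out_m60, mask_m60 = active_direction_pass_filter(left, right, fs, target_deg=-60.0)
```

Steering the filter at 30 degrees passes the tone's sub-band (bin 32 at 500 Hz) and reconstructs the tone, while steering it at -60 degrees rejects it.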
5. (Previously presented): A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external
sounds, and, based on sound signals from the microphones, determines a direction of at least one
speaker by sound source separation and localization by grouping based on pitch extraction and
harmonic sounds,

a face module which is provided with a camera to take images of the robot's front, identifies each
speaker, and extracts a face event from each speaker's face recognition and localization, based
on images taken by the camera,

a stereo module which extracts and localizes a longitudinally long object, based on a
parallax extracted from images taken by a stereo camera, and extracts a stereo event,

a motor control module which is provided with a drive motor to rotate the robot in the
horizontal direction, and extracts a motor event based on a rotational position of the drive motor,

an association module which determines each speaker's direction, based on directional 
information of sound source localization of the auditory event and face localization of the face 
event, from said auditory, face, stereo, and motor events, generates an auditory stream, a face 
stream and a stereo visual stream by connecting said events in the temporal direction using a 
Kalman filter for determinations, and further generates an association stream associating these 
streams, and 

an attention control module which conducts attention control based on said streams, and
drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to respond to the case where a plurality of speakers speak to
said robot from different directions with the robot's front direction as the base, acoustic models are
provided for each direction so as to respond to each speaker and each direction,

wherein the auditory module collects sub-bands having an interaural phase difference (IPD)
or interaural intensity difference (IID) within a predetermined range by an active direction pass
filter having a pass range which, according to auditory characteristics, is minimum in the
frontal direction and becomes larger as the angle widens to the left and right, based on accurate
sound source directional information from the association module, conducts sound source
separation by reconstructing the waveform of a sound source, conducts speech recognition in parallel
for one sound signal separated by sound source separation using a plurality of the acoustic models,
integrates the speech recognition results from each acoustic model by a selector, and judges the most
reliable speech recognition result among the speech recognition results.
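Claims 4 and 5 state that the association module connects events in the temporal direction using a Kalman filter. A minimal Python sketch of that idea, assuming a scalar random-walk model of one speaker's azimuth and a simple gate for deciding whether a new localization event belongs to the stream, might look as follows. The noise values and gate width are hypothetical, and azimuth wrap-around at ±180 degrees is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class DirectionStream:
    """Hypothetical stream state: one speaker's azimuth (deg) and its variance."""
    azimuth: float
    variance: float
    process_noise: float = 4.0       # deg^2 per frame, assumed
    measurement_noise: float = 25.0  # deg^2, assumed localization error

    def predict(self) -> None:
        # Random-walk motion model: the azimuth is expected to stay put,
        # while its uncertainty grows by the process noise.
        self.variance += self.process_noise

    def accepts(self, measured: float, gate_sigma: float = 3.0) -> bool:
        # Connect an event to this stream only if it falls inside the gate.
        innovation_var = self.variance + self.measurement_noise
        return (measured - self.azimuth) ** 2 <= gate_sigma**2 * innovation_var

    def update(self, measured: float) -> None:
        # Standard scalar Kalman update with gain K = P / (P + R).
        gain = self.variance / (self.variance + self.measurement_noise)
        self.azimuth += gain * (measured - self.azimuth)
        self.variance *= 1.0 - gain


stream = DirectionStream(azimuth=10.0, variance=100.0)
for z in [12.0, 11.0, 13.0, 12.0]:  # per-frame localization events (degrees)
    stream.predict()
    if stream.accepts(z):
        stream.update(z)
```

After four nearby events the estimate settles close to 12 degrees, and an event at 120 degrees falls outside the gate, so it would start or join a different stream rather than corrupt this one.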

6. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5, 
characterized in that:

when speech recognition by the auditory module fails, the attention control module is
configured to turn the microphones and the camera toward the sound source direction of the sound
signals, to collect the speech again with the microphones, and to have the auditory module perform
speech recognition of the speech again, based on the sound signals that have undergone sound
source localization and sound source separation.
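The retry behavior of Claim 6 is essentially a control loop. The following Python sketch is hypothetical throughout: `recognize` stands in for the whole collect, localize, separate, and recognize pipeline (returning `None` on failure), `turn_to` stands in for the motor command that points the microphones and camera, and `max_retries` is an assumed limit the claim does not specify.

```python
from typing import Callable, Optional


def recognize_with_retry(
    recognize: Callable[[], Optional[str]],
    turn_to: Callable[[float], None],
    source_direction_deg: float,
    max_retries: int = 1,
) -> Optional[str]:
    """Retry recognition after turning toward the source; None means recognition failed."""
    result = recognize()
    attempts = 0
    while result is None and attempts < max_retries:
        turn_to(source_direction_deg)  # point the microphones and camera at the source
        result = recognize()           # collect, localize, separate and recognize again
        attempts += 1
    return result


# Toy stand-ins: the first attempt fails, the second succeeds.
outcomes = iter([None, "hello"])
turned = []
text = recognize_with_retry(lambda: next(outcomes), turned.append, source_direction_deg=45.0)
```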

7. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5, 
characterized in that:

the auditory module refers to the face event from the face module upon performing the 
speech recognition. 

8. (Original): A robotics visual and auditory system as set forth in Claim 5, characterized in that:

the auditory module refers to the stereo event from the stereo module upon performing the 
speech recognition. 

9. (Original): A robotics visual and auditory system as set forth in Claim 5, characterized in that:

the auditory module refers to the face event from the face module and the stereo event from 
the stereo module upon performing the speech recognition. 
10. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5,
wherein the system is provided with a dialogue part that outputs the speech recognition results
judged by the auditory module to the outside.

11. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5, 
wherein the pass range of the active direction pass filter can be controlled for each frequency.
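One way to let the pass range vary with frequency, as Claim 11 allows, is a lookup table interpolated per sub-band. The table values and the side-widening factor below are illustrative assumptions only; the claim does not specify them. The widening for sources away from the front mirrors the direction-dependent pass range recited in Claims 4 and 5.

```python
import numpy as np

# Hypothetical table of pass-range half-widths (degrees) at a few frequencies (Hz).
TABLE_FREQS_HZ = np.array([0.0, 500.0, 1000.0, 4000.0])
TABLE_WIDTH_DEG = np.array([40.0, 30.0, 20.0, 15.0])


def pass_range_per_frequency(freqs_hz, target_deg: float, side_gain: float = 1.0):
    """Per-frequency pass range, widened for sources away from the robot's front."""
    # Linear interpolation between table entries; values are clamped outside the table.
    base = np.interp(freqs_hz, TABLE_FREQS_HZ, TABLE_WIDTH_DEG)
    return base * (1.0 + side_gain * abs(np.sin(np.radians(target_deg))))


widths_front = pass_range_per_frequency(np.array([250.0, 750.0, 8000.0]), target_deg=0.0)
widths_side = pass_range_per_frequency(np.array([250.0, 750.0, 8000.0]), target_deg=90.0)
```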

12. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5, 
wherein, upon integrating the speech recognition results, the selector calculates a cost function
value based on the recognition result of the speech recognition and the direction determined by
the association module, and judges the speech recognition process result having the maximum
value of the cost function to be the most reliable speech recognition result.

13. (Original): A robotics visual and auditory system as set forth in Claim 4 or Claim 5, 
characterized in that the system recognizes the speaker's name based on the acoustic model utilized
to obtain the speech recognition result.
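Because the acoustic models in these claims are prepared per speaker, the model that produced the selected result also identifies who spoke. The Python sketch below assumes a hypothetical `(speaker, direction)` key for each model and, for simplicity, selects by the highest raw score; in the claimed system the selection would be made by the selector.

```python
from typing import Dict, Tuple

# Hypothetical model key: (speaker name, direction in degrees from the robot's front).
ModelKey = Tuple[str, int]


def recognize_speaker_name(scores: Dict[ModelKey, float]) -> str:
    """Name the speaker whose acoustic model yielded the best-scoring recognition result."""
    if not scores:
        raise ValueError("no recognition results")
    (speaker, _direction), _best = max(scores.items(), key=lambda kv: kv[1])
    return speaker


scores = {("Alice", 0): -12.0, ("Bob", 0): -9.5, ("Bob", 60): -11.0}
name = recognize_speaker_name(scores)
```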